For our exploratory analysis, we used datasets from the Centers for Disease Control and Prevention (CDC) and a dataset collected by the Bureau of Labor Statistics (BLS) Local Area Unemployment Statistics (LAUS) program. We focused on the relationship between economic status and two cancer measures: deaths per case and new cases per population. We also restricted our analysis to the West Coast of the United States: Washington, Oregon, and California.
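To make the two measures concrete, here is a minimal sketch of how they could be computed for a single county. The record layout (`CountyRecord` and its fields) is hypothetical, the numbers in the usage example are made up, and scaling incidence per 100,000 residents is an assumption based on the common public-health convention. Both ratios here are crude and not age-adjusted.

```python
from dataclasses import dataclass


@dataclass
class CountyRecord:
    # Hypothetical county-level record; field names are illustrative only.
    county: str
    state: str
    population: int
    new_cases: int
    cancer_deaths: int


def deaths_per_case(r: CountyRecord) -> float:
    """Cancer deaths divided by new cancer cases (crude, not age-adjusted)."""
    return r.cancer_deaths / r.new_cases


def cases_per_100k(r: CountyRecord) -> float:
    """New cases per 100,000 residents (crude, not age-adjusted).

    Multiplying before dividing keeps whole-number results exact.
    """
    return r.new_cases * 100_000 / r.population


# Usage with made-up numbers:
example = CountyRecord("Example County", "WA", 200_000, 900, 300)
print(cases_per_100k(example))   # 450.0
print(deaths_per_case(example))  # about 0.333
```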
The box plot below shows the age-adjusted rate of new cancer cases by sex, pooling data from all three West Coast states in our analysis. At first glance, the male population has a noticeably higher rate of new cases than the female population: the first quartile (Q1) for males is about 53 cases per ??? higher than for females, and the third quartile (Q3) is about 77 cases per ??? higher. It seems that the body
[Figure sex_dist: box plot of the age-adjusted rate of new cancer cases by sex]
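The Q1 and Q3 gaps quoted above are differences between the quartiles of the two distributions. A minimal sketch of that calculation with Python's standard library follows; the input lists are hypothetical placeholders, not our data. It uses `statistics.quantiles` with `method="inclusive"`, which treats the minimum and maximum as the 0th and 100th percentiles; other quartile rules can shift the values slightly.

```python
from statistics import quantiles


def quartile_gaps(male_rates, female_rates):
    """Return (Q1 gap, Q3 gap) as male quartile minus female quartile."""
    # quantiles(..., n=4) returns the three cut points [Q1, median, Q3].
    m_q1, _, m_q3 = quantiles(male_rates, n=4, method="inclusive")
    f_q1, _, f_q3 = quantiles(female_rates, n=4, method="inclusive")
    return m_q1 - f_q1, m_q3 - f_q3


# Hypothetical per-county rates, for illustration only:
print(quartile_gaps([10, 20, 30, 40, 50], [5, 10, 15, 20, 25]))  # (10.0, 20.0)
```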
[Figure: race_dist]
For the bubble plot below, we compared the 2019 median household income of counties in Washington, Oregon, and California with their age-adjusted cancer death rates to understand how economic status relates to cancer mortality. We used the gapminder and plotly packages to color-code the states and visualize this relationship. In the visualization, higher-income counties in California had lower mortality rates than lower-income counties. This may be due to better access to hospitals offering cancer treatment in higher-income counties.
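One way to put a number on the pattern in the bubble plot is a correlation coefficient between county median income and age-adjusted death rate, where a negative value would be consistent with the observation above. The sketch below computes Pearson's r in plain Python; the input values are hypothetical, our analysis did not report a correlation, and a correlation alone would not show that income causes lower mortality.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical counties: income in $1,000s vs. deaths per 100,000.
income = [40, 60, 80, 100]
death_rate = [180, 160, 140, 120]
print(pearson_r(income, death_rate))  # -1.0 (a perfectly linear, made-up example)
```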
The map below displays the age-adjusted rate of new cancer cases. We used the map functionality in ggplot to retrieve the latitudes and longitudes for the three states we focused on, then merged this geographic dataset with our original comprehensive dataset to generate a choropleth map in which lighter colors indicate a greater age-adjusted case rate.
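The merge step is essentially a left join on a shared state and county key. The plain-Python sketch below illustrates the idea with hypothetical rows; it assumes both tables name counties by state and county, and it normalizes case and surrounding whitespace because boundary data and health data often spell names differently. The function names and dictionary keys are illustrative, not taken from our code. Counties that fail to match are returned separately, since on a choropleth they would otherwise appear as silently blank areas.

```python
def norm_key(state, county):
    """Build a join key that ignores case and surrounding whitespace."""
    return (state.strip().lower(), county.strip().lower())


def merge_rates(boundary_points, rates):
    """Left-join an age-adjusted rate onto each boundary point.

    boundary_points: list of dicts with 'state', 'county', 'long', 'lat'
    rates: list of dicts with 'state', 'county', 'rate'
    Returns (merged_points, sorted list of unmatched keys).
    """
    lookup = {norm_key(r["state"], r["county"]): r["rate"] for r in rates}
    merged, unmatched = [], set()
    for p in boundary_points:
        key = norm_key(p["state"], p["county"])
        rate = lookup.get(key)  # None when the county has no rate row
        if rate is None:
            unmatched.add(key)
        merged.append({**p, "rate": rate})
    return merged, sorted(unmatched)


# Hypothetical rows for illustration:
points = [
    {"state": "washington", "county": "king", "long": -122.0, "lat": 47.5},
    {"state": "oregon", "county": "lane", "long": -123.0, "lat": 44.0},
]
rates = [{"state": "Washington", "county": "King", "rate": 450.0}]
merged, unmatched = merge_rates(points, rates)
print(unmatched)  # [('oregon', 'lane')]
```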